1 Introduction

This report was generated using the miND® analysis pipeline v1.5.2 (Diendorfer et al. 2022). Additional data (fastq raw files, mapping details, etc.) can be made available upon request, but are not included in this report due to file size restrictions.

  • Project ID: PXXX
  • Customer: Customer Name
  • TAmiRNA project manager: Linda Prähauser
  • Report generated: 2025-01-27 (09:25:05 GMT +0000) by system user “root”

Comment:


Plasma Biomarker Discovery


Analysis parameters:

  • Species: Homo sapiens (hsa, TXID: 9606)
  • Sequencing adapter: RealSeq (cutadapt -a Realseq3P=TGGAATTCTC -u1)
  • Minimum read length: 17nt
  • Reads quality cutoff: 30 (phred quality score)
  • Significance level: 0.05
  • miND® spike-in version: Vers3 (Lot: MSY2021 (Reconstitution: 2403))

Tabular data can be filtered or sorted using the fields and options at the top of each table. To export the data for further processing, please select the desired format (Excel or CSV) using the buttons at the top of the table.

2 Data exploration

The first part of this report gives a general overview of the sequencing data. Please be aware that any downstream analysis depends on certain assumptions about the distribution and quality of the data. It is important to manually evaluate the data with the plots and tables provided in this section. Samples that do not pass those evaluations should be excluded from statistical analysis, as they can distort the results.

2.1 Sample table

2.2 Raw data quality control

To evaluate the quality and check the data for common sequencing problems, all processed files are also analysed with the “fastQC” tool. The results of all samples are then combined into one report together with statistics about the adapter trimming step.

The multiQC report is provided alongside this file (multiqc_report.html).

We recommend checking the multiQC report before interpreting the results. Any samples that do not pass the manual evaluation at this step should be excluded from further processing and analysis, as they could distort the results.

2.3 Reads classification

Reads classification gives insights into the type and origin (i.e. composition) of all sequences obtained for each sample. After processing of the reads (adapter trimming, quality filtering, size filtering), all remaining reads are mapped against various databases to categorize them. This is done in a hierarchical process, where reads are first mapped against the genome. Genome mapped reads are then mapped against known miRNA sequences of the selected species, and only those not identified as miRNAs get mapped against other databases (RNAcentral filtered for entries of the defined species) for further classification.

“Unclassified genomic” indicates reads that were mapped against the genome but were not found in any of the RNA specific databases, while “unmapped” are reads that could not be found in the given reference genome.

The “Relative reads” tab shows the same data scaled to 100% to indicate the relative abundance of each read classification in a given sample.

You can double-click any of the RNA categories in the legend to hide all others and show only that category.

2.3.1 Absolute reads

2.3.2 Read composition (relative)

2.3.3 Mapping statistics

The following histograms show the number (y-axis) of genome mapped reads against their length (x-axis) for each sample. The stacked bar charts visualize the proportions of unmapped and mapped reads and can be used to evaluate the read quality. Mature miRNAs are typically about 22 nucleotides long.

2.4 Read classification table

The data in this table are equivalent to the data shown in the reads classification graph above (absolute reads). These are raw read counts (without any normalization).

2.5 miRNA quantification

2.5.1 miRNA RPM

This table provides normalized data for all miRNAs identified in each sample. Read counts are normalized to reads per million mapped miRNA reads (RPM).
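As an illustration of the RPM scaling described above, the following minimal sketch normalizes the raw miRNA counts of one sample to one million mapped miRNA reads. The function name and the read counts are hypothetical and only serve as an example; the pipeline's actual implementation may differ.

```python
def reads_per_million(raw_counts):
    """Scale the raw miRNA read counts of one sample to reads per million (RPM)."""
    total = sum(raw_counts.values())
    if total == 0:
        # No miRNA reads at all: avoid a division by zero.
        return {name: 0.0 for name in raw_counts}
    return {name: count * 1_000_000 / total for name, count in raw_counts.items()}


# Hypothetical raw counts for a single sample (illustration only).
sample = {"hsa-miR-16-5p": 600, "hsa-miR-21-5p": 300, "hsa-miR-122-5p": 100}
rpm = reads_per_million(sample)
```

After scaling, the RPM values of a sample sum to one million, which makes samples with different sequencing depths comparable.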

Please use the download link provided underneath the table to save the miRNA mappings data. The buttons provided at the top of the table can also be used, but won’t include detailed group information of the samples.

Download extended miRNA mapping table (RPM)

2.5.2 miRNA raw reads

This table provides the read counts for all miRNAs identified in each sample. These are raw read counts (without any normalization).

Please use the download link provided underneath the table to save the miRNA mappings data. The buttons provided at the top of the table can also be used, but won’t include detailed group information of the samples.

Download extended miRNA mapping table (raw reads)

2.5.3 Identified miRNAs comparison

This graph shows the number of distinct mature miRNAs identified in each sample.

Download identified miRNAs comparison data

2.5.4 miRNA read count distribution

This overview plots the abundance of a miRNA (collapsed read count) on the x axis and the number of miRNAs within that abundance range on the y axis. It illustrates the distribution of miRNA read counts in the sample.

2.6 miND® spike-in analysis

miND® spike-ins (Khamina et al. 2022) were used to calculate absolute concentrations of miRNAs detected in the NGS data. To calculate miRNA concentrations, a linear regression model without intercept (y ~ 0 + x) is used. Prediction intervals are provided in addition to the estimated concentrations in the downloadable data below. Please check that the selected model is an appropriate choice for the given data and consider the prediction intervals when interpreting the results.
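To make the model formula y ~ 0 + x more concrete, the following sketch fits a regression line through the origin and derives a prediction interval for a new observation. It is only an illustration under stated assumptions: which quantity is used as x and y, and whether the data are transformed beforehand, is not specified in this report, so the calibrator values below are hypothetical. The Student t quantile (t_crit) is supplied by the caller, for example from a statistics table.

```python
import math


def fit_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def prediction_interval(x, y, x_new, t_crit):
    """Return (lower, predicted, upper) for a new observation at x_new.

    t_crit is the two-sided Student t quantile for n - 1 degrees of freedom
    (one parameter, the slope, is estimated from n calibrators).
    """
    n = len(x)
    b = fit_through_origin(x, y)
    sxx = sum(xi * xi for xi in x)
    rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 1))  # residual standard error
    half_width = t_crit * s * math.sqrt(1 + x_new ** 2 / sxx)
    y_hat = b * x_new
    return y_hat - half_width, y_hat, y_hat + half_width


# Hypothetical calibrators: x = spike-in signal, y = known input amount.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
slope = fit_through_origin(x, y)
# 3.182 is the 97.5% t quantile for 3 degrees of freedom (95% interval).
low, fit, high = prediction_interval(x, y, x_new=2.5, t_crit=3.182)
```

The interval widens with the residual scatter of the calibrators and with the distance of x_new from the origin, which is why wide intervals are a reason to interpret a predicted concentration with caution.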

2.6.1 Predicted miRNA concentrations [molecules/uL] and prediction intervals

This table contains all identified miRNAs in each sample. miRNA reads are converted to absolute molecules/uL based on NGS spike-in calibrators.

Please use the download link provided underneath the table to save the miRNA mappings data. The buttons provided at the top of the table can also be used, but won’t include detailed group information of the samples.

Download miRNA absolute quantification table (concentration and prediction intervals)

Download miRNA absolute quantification table (concentration)

Download miRNA absolute quantification table (prediction intervals only)

2.6.2 miND® spike-in quality control

The ‘Calibration QC’ column contains information about the quality control evaluation based on the miND® spike-ins (Khamina et al. 2022). For a sample to pass the calibration QC checks, the following must be true:

  • 5 or more miND® spike-in core sequences detected
  • linear model parameters calculated
  • squared Pearson correlation coefficient (r-squared) above 0.95
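The three criteria listed above can be expressed as a single check. The sketch below is a hypothetical helper, not the pipeline's own code; the function and parameter names are assumptions chosen for illustration, and only the thresholds (5 core sequences, r-squared above 0.95) are taken from this report.

```python
def passes_calibration_qc(detected_core_spike_ins, model_fitted, r_squared,
                          min_spike_ins=5, min_r_squared=0.95):
    """Return True only if all three calibration QC criteria are met."""
    if detected_core_spike_ins < min_spike_ins:
        return False  # too few spike-in core sequences detected
    if not model_fitted or r_squared is None:
        return False  # linear model parameters could not be calculated
    return r_squared > min_r_squared  # strictly above the threshold


# Hypothetical samples (illustration only).
ok_sample = passes_calibration_qc(7, True, 0.98)
few_spike_ins = passes_calibration_qc(4, True, 0.99)
borderline_fit = passes_calibration_qc(6, True, 0.95)
```

Here ok_sample is True, while the other two samples fail: one detects fewer than five core sequences, the other reaches exactly 0.95 and is therefore not above the threshold.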

2.6.2.1 Overview

2.6.2.2 10/CT/AKS

2.6.2.3 12/CT/AKS

2.6.2.4 18/CT/AKS

2.6.2.5 20/CT/AKS

2.6.2.6 28/CT/AKS

2.6.2.7 32/CT/AKS

2.6.2.8 ID_173

2.6.2.9 ID_175

2.6.2.10 ID_176

2.6.2.11 ID_178

2.6.2.12 ID_196

2.6.2.13 ID_198

2.6.2.14 ID_199

2.6.2.15 4/CT/AKS

2.6.2.16 6/CT/AKS

2.6.2.17 16/CT/AKS

2.6.2.18 51/CT/AKS

2.6.2.19 68/CT/AKS

2.6.2.20 76/CT/AKS

2.6.2.21 92/CT/AKS

2.6.2.22 104/CT/AKS

2.6.2.23 169/CT/AKS

2.6.2.24 44/CT/SAP

2.6.2.25 46/CT/SAP

2.6.2.26 48/CT/SAP

2.6.2.27 56/CT/SAP

2.6.2.28 58/CT/SAP

2.6.2.29 62/CT/SAP

2.6.2.30 71/CT/SAP

2.6.2.31 76/CT/SAP

2.6.2.32 85/CT/SAP

2.6.2.33 92/CT/SAP

2.6.2.34 97/CT/SAP

2.6.2.35 99/CT/SAP

2.6.2.36 103/CT/SAP

2.6.2.37 105/CT/SAP

2.6.2.38 119/CT/SAP

2.6.2.39 121/CT/SAP

2.6.2.40 125/CT/SAP

2.6.2.41 128/CT/SAP

2.6.2.42 130/CT/SAP

2.6.2.43 133/CT/SAP

2.6.2.44 139/CT/SAP

2.6.2.45 141/CT/SAP

2.6.2.46 142/CT/SAP

2.6.2.47 147/CT/SAP

2.6.2.48 153/CT/SAP

2.6.2.49 157/CT/SAP

2.7 Heatmaps

Data is based on RPM normalized reads and scaled using the unit variance method for visualization in heatmaps. Clustering is done with pheatmap using average linkage, with distances calculated from correlations.
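The two steps named above can be sketched as follows: unit variance scaling centers the values of one miRNA and divides them by their standard deviation, and a correlation-based distance is taken here as 1 minus the Pearson correlation. This is a simplified illustration with hypothetical helper names and values; it assumes the sample standard deviation and does not reproduce pheatmap itself.

```python
import math
import statistics


def unit_variance_scale(values):
    """Center the values and scale them to a standard deviation of 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def correlation_distance(a, b):
    """Distance between two profiles, defined as 1 - Pearson correlation."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1 - cov / math.sqrt(var_a * var_b)


scaled = unit_variance_scale([1.0, 2.0, 3.0])
same_shape = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_shape = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Profiles with the same shape have a distance close to 0 regardless of their absolute level, while opposite profiles have a distance close to 2.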

2.7.1 Top 50 miRNAs

This heatmap shows only the top 50 miRNAs (ranked by coefficient of variation, CV%). An additional filter was introduced to increase robustness: only miRNAs with an RPM value above 5 in at least a fraction of 1 / n(groups) of the samples are included (e.g. with 4 groups, the miRNA has to have an RPM value above 5 in at least 25% of the samples). This removes miRNAs that have a high CV but are expressed in too few samples to bear any statistical significance or biological relevance.
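The selection described above can be sketched in a few lines. The example is a simplified illustration with hypothetical miRNA names and values; it assumes that CV% is the sample standard deviation divided by the mean, which is not stated explicitly in this report.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / mean * 100


def top_variable_mirnas(rpm_table, n_groups, top_n=50, min_rpm=5):
    """Rank miRNAs by CV% after requiring RPM > min_rpm in >= 1/n_groups of samples.

    rpm_table maps a miRNA name to its list of RPM values (one per sample).
    """
    kept = {}
    for name, values in rpm_table.items():
        fraction_expressed = sum(v > min_rpm for v in values) / len(values)
        if fraction_expressed >= 1 / n_groups:
            kept[name] = cv_percent(values)
    return sorted(kept, key=kept.get, reverse=True)[:top_n]


# Hypothetical RPM values for four samples in two groups (illustration only).
table = {
    "miR-a": [100, 10, 100, 10],
    "miR-b": [50, 50, 50, 50],
    "miR-c": [900, 0, 0, 0],  # high CV, but expressed in only 1 of 4 samples
    "miR-d": [20, 40, 20, 40],
}
selected = top_variable_mirnas(table, n_groups=2, top_n=2)
```

In this example miR-c is discarded by the presence filter despite having the highest CV, and the two most variable remaining miRNAs are miR-a and miR-d.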

Download data used to generate the heatmap

2.7.2 All miRNAs

326 miRNAs are shown in the following heatmap, based on the same filter described for the top 50 miRNAs.

Download data used to generate the heatmap

2.8 PCA

Principal component analysis (PCA) uses RPM normalized miRNA reads and reduces the data dimensions down to two, so that the samples can be plotted in a graph. A quick introduction to PCA plots and the underlying principle can be found here.

Samples are colored either by their first group or by the cluster they were assigned to. Clustering is done using the Ward (ward.D2) algorithm of hclust (split at a Euclidean cluster height of 40).

2.8.1 PCA cluster by sample groups

2.8.2 PCA cluster by hierarchical clustering

2.9 t-SNE

t-SNE is a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space (like 2 dimensions here). It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability. More details can be found in the author’s publication (Maaten and Hinton 2008).

3 Differential expression analysis

Differential expression analysis uses statistical tests to find miRNAs that are over- or underexpressed in a group. For this report, the well-established analysis toolkit edgeR (Robinson, McCarthy, and Smyth 2009) was used.

Annotations in this result are standardized such that, for a contrast of GroupA vs. GroupB, a positive logFC indicates that the miRNA is upregulated in GroupA. For example, a logFC of 2.5 corresponds to an increase of the miRNA by a factor of 2^2.5 = 5.66.
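The conversion between logFC and fold change can be reproduced with a small helper. The function names below are hypothetical and only illustrate the arithmetic of the example given above (base-2 logarithm).

```python
def fold_change(log_fc):
    """Convert a base-2 log fold change into a linear fold change."""
    return 2 ** log_fc


def direction(log_fc, group_a="GroupA", group_b="GroupB"):
    """Describe the direction for a contrast of group_a vs. group_b."""
    if log_fc > 0:
        return f"upregulated in {group_a}"
    if log_fc < 0:
        return f"downregulated in {group_a}"
    return f"unchanged between {group_a} and {group_b}"


up = fold_change(2.5)     # about 5.66
down = fold_change(-1.0)  # 0.5, that is, half the level in GroupA
label = direction(2.5)
```

A negative logFC therefore corresponds to a fold change between 0 and 1, meaning the miRNA is lower in the first group of the contrast.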

Please select a contrast below to view the differential expression analysis results.

3.1 Disease_A versus Control

3.1.1 Sample overview

The following two tables give a quick overview of the samples that were part of the two groups compared in this contrast.

3.1.1.1 Samples group A

3.1.1.2 Samples group B

3.1.1.3 Independent filtering

As hundreds or even thousands of miRNAs are tested for each contrast, multiple testing adjustment is required to control the false discovery rate (FDR). This is traditionally done using p-value adjustment methods like Benjamini-Hochberg (BH) combined with an arbitrary cutoff for lowly expressed miRNAs prior to analysis. In this case, the BH method reliably reduces the number of false positives, but at the same time removes a large number of valid observations. In addition, the cutoff for lowly expressed miRNAs might remove biologically relevant observations.

Filtering of reads should be done independently of the group assignments. This avoids introducing any bias into the downstream differential expression analysis.

To maximize the sensitivity of our analysis, we have implemented a method that removes low read count miRNAs from the data set until a statistically relevant set of significant results remains. This approach of independent filtering is also used by DESeq2 and is currently the best-established filtering method prior to FDR adjustment. Assuming that most false positives are caused by low-abundance miRNAs, the algorithm removes quantiles of miRNAs from the low-abundance end and checks whether the number of significant miRNAs increases after BH adjustment. This would be the case if mostly false positives have been removed, because BH adjustment would then be more sensitive and would not remove as many true positives, increasing the overall number of significant results.
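The idea of independent filtering can be illustrated with a small, self-contained sketch. It is not the pipeline's implementation: the function names, the number of tested quantile steps and the example values are assumptions. The sketch ranks miRNAs by their mean abundance across all samples (a statistic that ignores group labels), removes increasing fractions from the low-abundance end, reapplies the BH adjustment and keeps the cutoff with the most significant results.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(p_values, alpha=0.05):
    """Number of tests with a BH adjusted p-value below alpha."""
    return sum(q < alpha for q in bh_adjust(p_values))


def independent_filter(mean_abundance, p_values, alpha=0.05, steps=10):
    """Return (removed fraction, significant count) of the best cutoff."""
    order = sorted(range(len(p_values)), key=lambda i: mean_abundance[i])
    best_fraction, best_count = 0.0, count_significant(p_values, alpha)
    for step in range(1, steps):
        n_removed = len(order) * step // steps
        kept = order[n_removed:]
        count = count_significant([p_values[i] for i in kept], alpha)
        if count > best_count:
            best_fraction, best_count = step / steps, count
    return best_fraction, best_count


# Hypothetical data: five low-abundance miRNAs with large p-values and
# five abundant miRNAs with small p-values (illustration only).
abundance = [1, 2, 3, 4, 5, 100, 200, 300, 400, 500]
p = [0.5, 0.6, 0.7, 0.8, 0.9, 0.001, 0.012, 0.02, 0.03, 0.04]
unfiltered = count_significant(p)
best_fraction, best_count = independent_filter(abundance, p)
```

Without filtering, only one miRNA remains significant after BH adjustment in this example. Removing the lowest 40% of miRNAs by abundance raises this to five, because fewer uninformative tests enter the adjustment.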

This method works reliably as long as there are any true positive results. If the result set consists only of false positives, removing the low-abundance results does not increase the number of significant results (as there are no true positives to enrich). In this case the algorithm falls back to filtering for lowly expressed miRNAs prior to DE and FDR adjustment: in a first step, we filter out miRNAs that are expressed only at very low levels, that is, with an RPM smaller than 10 divided by the smallest library size (in millions of reads) in at least half of the samples of the smaller group. Those miRNAs carry no biological or statistical relevance (Chen, Lun, and Smyth 2016) as they have very low read counts in both groups.
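The prefiltering cutoff can be illustrated as follows. This sketch makes two assumptions that are not spelled out in the text: the smallest library size is expressed in millions of reads, and a miRNA is retained when it reaches the cutoff in at least half of the samples of the smaller group (rounded up). Function names and library sizes are hypothetical.

```python
import math


def prefilter_cutoff_rpm(library_sizes, min_count=10):
    """RPM cutoff: min_count divided by the smallest library size in millions."""
    return min_count / (min(library_sizes) / 1_000_000)


def keep_mirna(rpm_values, cutoff, smaller_group_size):
    """Keep a miRNA if enough samples reach the cutoff (assumed rule)."""
    min_samples = math.ceil(smaller_group_size / 2)
    return sum(v >= cutoff for v in rpm_values) >= min_samples


# Hypothetical library sizes (total mapped miRNA reads per sample).
cutoff = prefilter_cutoff_rpm([4_000_000, 6_500_000, 9_000_000])
kept = keep_mirna([3.0, 0.0, 4.1, 2.6, 0.0, 0.0], cutoff, smaller_group_size=3)
dropped = keep_mirna([0.5, 0.0, 3.0, 0.0, 0.0, 0.0], cutoff, smaller_group_size=3)
```

With a smallest library of 4 million reads the cutoff is 2.5 RPM, which corresponds to roughly 10 raw reads in that library. Because the rule only looks at abundance and group size, not at which group a sample belongs to, it does not bias the subsequent test.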

This plot visualizes the independent filtering method based on significant observations used for this contrast. The quantile of reads removed prior to BH p-value adjustment is plotted on the x axis, while the number of significant observations is shown on the y axis. The algorithm aims to maximize the number of significant observations and picks the appropriate cutoff.

Prefiltering set cutoff to 2.66 RPM in at least 6 samples. There were 1267 low read count miRNAs removed, accounting for 0.0257% (70491 reads absolute) of the total reads.

Download RPM table of prefiltering removed miRNAs

FDR based cutoff (see graph) removed 0 low read count miRNAs, accounting for 0% (0 reads absolute) of the total reads.

3.1.2 Differentially expressed miRNAs

This table shows only miRNAs that are significantly differentially expressed (FDR < 0.05).

3.1.3 Volcano plot

This graph visualizes the relation between the logFC (how much a miRNA changed between groups) and the statistical significance of this change. Statistical significance is expressed as false discovery rate (FDR) according to Benjamini and Hochberg (Benjamini and Hochberg 1995). miRNAs higher up have a smaller FDR value, while miRNAs further to the left or right of the center show a greater differential expression.

3.1.3.1 FDR based

3.1.3.2 p-value based

3.1.4 MA plot

MA plots visualize the relation between the mean expression of a miRNA (mean of expression counts in both groups, on the x axis = A) and its difference between the two groups (logFC, on the y axis = M). Significantly differentially expressed miRNAs (FDR < 0.05) are shown in red. This plot can be used to check the expression levels of significantly differentially expressed miRNAs.

3.1.5 Top up- and down-regulated

Top up- and down-regulated miRNAs in the given contrast with their CPM values. CPM normalized values are calculated by edgeR and offer a more robust normalization for the calculation of differential expression than RPM. miRNAs are ordered by logFC (FDR < 0.05 only) starting with the greatest on the top left.

For miRNAs with no reads (CPM = 0) in a sample, the CPM was set to 1, so that they can be displayed in this logarithmic plot as a 0 on the y axis (as the log10 of 0 is undefined).

3.1.5.1 Top up-regulated miRNAs

3.1.5.2 Top down-regulated miRNAs

3.1.6 All miRNAs

3.1.6.1 edgeR results

This table contains the results of the differential expression analysis for all tested miRNAs. Additional TMM values calculated by edgeR are provided in the edgeR test statistics table.

3.1.6.2 edgeR test statistics

This table contains the results of edgeR’s glmQLFTest() function.

3.2 Disease_B versus Control

3.2.1 Sample overview

The following two tables give a quick overview of the samples that were part of the two groups compared in this contrast.

3.2.1.1 Samples group A

3.2.1.2 Samples group B

3.2.1.3 Independent filtering

As hundreds or even thousands of miRNAs are tested for each contrast, multiple testing adjustment is required to control the false discovery rate (FDR). This is traditionally done using p-value adjustment methods like Benjamini-Hochberg (BH) combined with an arbitrary cutoff for lowly expressed miRNAs prior to analysis. In this case, the BH method reliably reduces the number of false positives, but at the same time removes a large number of valid observations. In addition, the cutoff for lowly expressed miRNAs might remove biologically relevant observations.

Filtering of reads should be done independently of the group assignments. This avoids introducing any bias into the downstream differential expression analysis.

To maximize the sensitivity of our analysis, we have implemented a method that removes low read count miRNAs from the data set until a statistically relevant set of significant results remains. This approach of independent filtering is also used by DESeq2 and is currently the best-established filtering method prior to FDR adjustment. Assuming that most false positives are caused by low-abundance miRNAs, the algorithm removes quantiles of miRNAs from the low-abundance end and checks whether the number of significant miRNAs increases after BH adjustment. This would be the case if mostly false positives have been removed, because BH adjustment would then be more sensitive and would not remove as many true positives, increasing the overall number of significant results.

This method works reliably as long as there are any true positive results. If the result set consists only of false positives, removing the low-abundance results does not increase the number of significant results (as there are no true positives to enrich). In this case the algorithm falls back to filtering for lowly expressed miRNAs prior to DE and FDR adjustment: in a first step, we filter out miRNAs that are expressed only at very low levels, that is, with an RPM smaller than 10 divided by the smallest library size (in millions of reads) in at least half of the samples of the smaller group. Those miRNAs carry no biological or statistical relevance (Chen, Lun, and Smyth 2016) as they have very low read counts in both groups.

This plot visualizes the independent filtering method based on significant observations used for this contrast. The quantile of reads removed prior to BH p-value adjustment is plotted on the x axis, while the number of significant observations is shown on the y axis. The algorithm aims to maximize the number of significant observations and picks the appropriate cutoff.

Prefiltering set cutoff to 2.79 RPM in at least 4 samples. There were 1213 low read count miRNAs removed, accounting for 0.0228% (54798 reads absolute) of the total reads.

Download RPM table of prefiltering removed miRNAs

FDR based cutoff (see graph) removed 43 low read count miRNAs, accounting for 0.0076% (18245 reads absolute) of the total reads.

3.2.2 Differentially expressed miRNAs

This table shows only miRNAs that are significantly differentially expressed (FDR < 0.05).

3.2.3 Volcano plot

This graph visualizes the relation between the logFC (how much a miRNA changed between groups) and the statistical significance of this change. Statistical significance is expressed as false discovery rate (FDR) according to Benjamini and Hochberg (Benjamini and Hochberg 1995). miRNAs higher up have a smaller FDR value, while miRNAs further to the left or right of the center show a greater differential expression.

3.2.3.1 FDR based

3.2.3.2 p-value based

3.2.4 MA plot

MA plots visualize the relation between the mean expression of a miRNA (mean of expression counts in both groups, on the x axis = A) and its difference between the two groups (logFC, on the y axis = M). Significantly differentially expressed miRNAs (FDR < 0.05) are shown in red. This plot can be used to check the expression levels of significantly differentially expressed miRNAs.

3.2.5 Top up- and down-regulated

Top up- and down-regulated miRNAs in the given contrast with their CPM values. CPM normalized values are calculated by edgeR and offer a more robust normalization for the calculation of differential expression than RPM. miRNAs are ordered by logFC (FDR < 0.05 only) starting with the greatest on the top left.

For miRNAs with no reads (CPM = 0) in a sample, the CPM was set to 1, so that they can be displayed in this logarithmic plot as a 0 on the y axis (as the log10 of 0 is undefined).

3.2.5.1 Top up-regulated miRNAs

3.2.5.2 Top down-regulated miRNAs

3.2.6 All miRNAs

3.2.6.1 edgeR results

This table contains the results of the differential expression analysis for all tested miRNAs. Additional TMM values calculated by edgeR are provided in the edgeR test statistics table.

3.2.6.2 edgeR test statistics

This table contains the results of edgeR’s glmQLFTest() function.

3.3 Disease_B versus Disease_A

3.3.1 Sample overview

The following two tables give a quick overview of the samples that were part of the two groups compared in this contrast.

3.3.1.1 Samples group A

3.3.1.2 Samples group B

3.3.1.3 Independent filtering

As hundreds or even thousands of miRNAs are tested for each contrast, multiple testing adjustment is required to control the false discovery rate (FDR). This is traditionally done using p-value adjustment methods like Benjamini-Hochberg (BH) combined with an arbitrary cutoff for lowly expressed miRNAs prior to analysis. In this case, the BH method reliably reduces the number of false positives, but at the same time removes a large number of valid observations. In addition, the cutoff for lowly expressed miRNAs might remove biologically relevant observations.

Filtering of reads should be done independently of the group assignments. This avoids introducing any bias into the downstream differential expression analysis.

To maximize the sensitivity of our analysis, we have implemented a method that removes low read count miRNAs from the data set until a statistically relevant set of significant results remains. This approach of independent filtering is also used by DESeq2 and is currently the best-established filtering method prior to FDR adjustment. Assuming that most false positives are caused by low-abundance miRNAs, the algorithm removes quantiles of miRNAs from the low-abundance end and checks whether the number of significant miRNAs increases after BH adjustment. This would be the case if mostly false positives have been removed, because BH adjustment would then be more sensitive and would not remove as many true positives, increasing the overall number of significant results.

This method works reliably as long as there are any true positive results. If the result set consists only of false positives, removing the low-abundance results does not increase the number of significant results (as there are no true positives to enrich). In this case the algorithm falls back to filtering for lowly expressed miRNAs prior to DE and FDR adjustment: in a first step, we filter out miRNAs that are expressed only at very low levels, that is, with an RPM smaller than 10 divided by the smallest library size (in millions of reads) in at least half of the samples of the smaller group. Those miRNAs carry no biological or statistical relevance (Chen, Lun, and Smyth 2016) as they have very low read counts in both groups.

This plot visualizes the independent filtering method based on significant observations used for this contrast. The quantile of reads removed prior to BH p-value adjustment is plotted on the x axis, while the number of significant observations is shown on the y axis. The algorithm aims to maximize the number of significant observations and picks the appropriate cutoff.

Prefiltering set cutoff to 2.79 RPM in at least 4 samples. There were 1159 low read count miRNAs removed, accounting for 0.028% (41163 reads absolute) of the total reads.

Download RPM table of prefiltering removed miRNAs

FDR based cutoff (see graph) removed 0 low read count miRNAs, accounting for 0% (0 reads absolute) of the total reads.

3.3.2 Differentially expressed miRNAs

This table shows only miRNAs that are significantly differentially expressed (FDR < 0.05).

3.3.3 Volcano plot

This graph visualizes the relation between the logFC (how much a miRNA changed between groups) and the statistical significance of this change. Statistical significance is expressed as false discovery rate (FDR) according to Benjamini and Hochberg (Benjamini and Hochberg 1995). miRNAs higher up have a smaller FDR value, while miRNAs further to the left or right of the center show a greater differential expression.

3.3.3.1 FDR based

3.3.3.2 p-value based

3.3.4 MA plot

MA plots visualize the relation between the mean expression of a miRNA (mean of expression counts in both groups, on the x axis = A) and its difference between the two groups (logFC, on the y axis = M). Significantly differentially expressed miRNAs (FDR < 0.05) are shown in red. This plot can be used to check the expression levels of significantly differentially expressed miRNAs.

3.3.5 Top up- and down-regulated

Top up- and down-regulated miRNAs in the given contrast with their CPM values. CPM normalized values are calculated by edgeR and offer a more robust normalization for the calculation of differential expression than RPM. miRNAs are ordered by logFC (FDR < 0.05 only) starting with the greatest on the top left.

For miRNAs with no reads (CPM = 0) in a sample, the CPM was set to 1, so that they can be displayed in this logarithmic plot as a 0 on the y axis (as the log10 of 0 is undefined).

3.3.5.1 Top up-regulated miRNAs

3.3.5.2 Top down-regulated miRNAs

3.3.6 All miRNAs

3.3.6.1 edgeR results

This table contains the results of the differential expression analysis for all tested miRNAs. Additional TMM values calculated by edgeR are provided in the edgeR test statistics table.

3.3.6.2 edgeR test statistics

This table contains the results of edgeR’s glmQLFTest() function.

4 Summary of differential expression analysis

The direction follows the previously mentioned annotation. So “upregulated” (logFC > 0) means that the miRNA is overexpressed in the first group of the contrast.

4.1 Differentially expressed miRNAs overlaps

Overlap of significantly differentially expressed miRNAs per contrast. The UpSet plot displays intersections in a matrix, where the rows correspond to the contrasts (sets) and the columns to the intersections between these sets. Below the plot you can find the content of the intersections in a downloadable format.

Download data used to generate the upSet plot

5 Appendix

Generated running miND® v1.5.2 3b0263b5fcae8e03b2f5fabc3154e14235784988 on 34723f207b2d.

Changed files:


5.1 Citation

When publishing data based on the results from this report, please cite:

Diendorfer, A, K Khamina, M Pultar, and M Hackl. 2022. “miND (miRNA NGS Discovery Pipeline): A Small RNA-Seq Analysis Pipeline and Report Generator for microRNA Biomarker Discovery Studies.” F1000Research 11 (233). https://doi.org/10.12688/f1000research.94159.1.

Khamina, Kseniya, Andreas B. Diendorfer, Susanna Skalicky, Moritz Weigl, Marianne Pultar, Teresa L. Krammer, Catharine Aquino Fournier, et al. 2022. “A MicroRNA Next-Generation-Sequencing Discovery Assay (miND) for Genome-Scale Analysis and Absolute Quantitation of Circulating MicroRNA Biomarkers.” International Journal of Molecular Sciences 23 (3). https://doi.org/10.3390/ijms23031226.

5.2 Methods

The following paragraph describes the methods used for generating this report as required by most publishers. Please consider trimming it down to the parts relevant for your publication and as required by the specific journal. For additional citations please see the “References” section at the end of this report.

Next-generation sequencing (NGS) data was analyzed using the miND® analysis pipeline (Diendorfer et al. 2022). Overall quality of the NGS data was evaluated automatically and manually with fastQC v0.12 (Andrews 2010) and multiQC v1.14 (Ewels et al. 2016). Reads from all passing samples were adapter trimmed and quality filtered using cutadapt v3.3 (Martin 2011) and filtered for a minimum length of 17nt. Mapping steps were performed with bowtie v1.3.0 (Langmead et al. 2009) and miRDeep2 v2.0.1.2 (Friedländer et al. 2012): reads were mapped first against the genomic reference GRCh38.p12 provided by Ensembl (Zerbino et al. 2018), allowing for two mismatches, and subsequently against miRBase v22.1 (Griffiths-Jones 2004), filtered for miRNAs of hsa only, allowing for one mismatch. For a general RNA composition overview, non-miRNA mapped reads were mapped against RNAcentral v23.0 (Sweeney et al. 2019) and then assigned to various RNA species of interest. Statistical analysis of preprocessed NGS data was done with R v4.0 and the packages pheatmap v1.0.12, pcaMethods v1.82 and genefilter v1.72. Differential expression analysis with edgeR v3.32 (Robinson, McCarthy, and Smyth 2009) used the quasi-likelihood negative binomial generalized log-linear model functions provided by the package. The independent filtering method of DESeq2 (Love, Huber, and Anders 2014) was adapted for use with edgeR to remove low-abundance miRNAs and thus optimize the false discovery rate (FDR) correction. Additional NGS QC and absolute quantification of miRNAs was done using miND® spike-ins (Khamina et al. 2022) based on a linear regression model.

5.3 R session information

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.0.5 (2021-03-31)
##  os       Debian GNU/Linux 11 (bullseye)
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    C.UTF-8
##  tz       Etc/UTC
##  date     2025-01-27
##  pandoc   2.19.2 @ /conda/fe3d9ca5acfdeb10baa13d69636bad4e_/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package       * version   date (UTC) lib source
##  annotate        1.68.0    2020-10-27 [1] Bioconductor
##  AnnotationDbi   1.52.0    2020-10-27 [1] Bioconductor
##  Biobase       * 2.50.0    2020-10-27 [1] Bioconductor
##  BiocGenerics  * 0.36.0    2020-10-27 [1] Bioconductor
##  bit             4.0.4     2020-08-04 [1] CRAN (R 4.0.3)
##  bit64           4.0.5     2020-08-30 [1] CRAN (R 4.0.3)
##  blob            1.2.3     2022-04-10 [1] CRAN (R 4.0.5)
##  bslib           0.4.0     2022-07-16 [1] CRAN (R 4.0.5)
##  cachem          1.0.6     2021-08-19 [1] CRAN (R 4.0.5)
##  callr           3.7.2     2022-08-22 [1] CRAN (R 4.0.5)
##  cellranger      1.1.0     2016-07-27 [1] CRAN (R 4.0.5)
##  cli             3.4.1     2022-09-23 [1] CRAN (R 4.0.5)
##  colorspace      2.0-3     2022-02-21 [1] CRAN (R 4.0.5)
##  crayon          1.5.1     2022-03-26 [1] CRAN (R 4.0.5)
##  crosstalk       1.2.0     2021-11-04 [1] CRAN (R 4.0.5)
##  data.table      1.14.2    2021-09-27 [1] CRAN (R 4.0.5)
##  DBI             1.1.3     2022-06-18 [1] CRAN (R 4.0.5)
##  devtools        2.4.4     2022-07-20 [1] CRAN (R 4.0.5)
##  digest          0.6.29    2021-12-01 [1] CRAN (R 4.0.5)
##  dplyr         * 1.0.10    2022-09-01 [1] CRAN (R 4.0.5)
##  DT            * 0.17      2021-01-06 [1] CRAN (R 4.0.3)
##  edgeR         * 3.32.1    2021-01-14 [1] Bioconductor
##  ellipsis        0.3.2     2021-04-29 [1] CRAN (R 4.0.3)
##  evaluate        0.16      2022-08-09 [1] CRAN (R 4.0.5)
##  fansi           1.0.3     2022-03-24 [1] CRAN (R 4.0.5)
##  farver          2.1.1     2022-07-06 [1] CRAN (R 4.0.5)
##  fastmap         1.1.0     2021-01-25 [1] CRAN (R 4.0.3)
##  fs              1.5.2     2021-12-08 [1] CRAN (R 4.0.5)
##  genefilter    * 1.72.1    2021-01-21 [1] Bioconductor
##  generics        0.1.3     2022-07-05 [1] CRAN (R 4.0.5)
##  ggfortify     * 0.4.14    2022-01-03 [1] CRAN (R 4.0.5)
##  ggplot2       * 3.3.6     2022-05-03 [1] CRAN (R 4.0.5)
##  ggrepel       * 0.8.2     2020-03-08 [1] CRAN (R 4.0.0)
##  glue            1.6.2     2022-02-24 [1] CRAN (R 4.0.5)
##  gridExtra     * 2.3       2017-09-09 [1] CRAN (R 4.0.5)
##  gtable          0.3.1     2022-09-01 [1] CRAN (R 4.0.5)
##  highr           0.9       2021-04-16 [1] CRAN (R 4.0.3)
##  hms             1.1.2     2022-08-19 [1] CRAN (R 4.0.5)
##  htmltools       0.5.3     2022-07-18 [1] CRAN (R 4.0.5)
##  htmlwidgets     1.5.4     2021-09-08 [1] CRAN (R 4.0.5)
##  httpuv          1.6.6     2022-09-08 [1] CRAN (R 4.0.5)
##  httr            1.4.4     2022-08-17 [1] CRAN (R 4.0.5)
##  IRanges         2.24.1    2020-12-12 [1] Bioconductor
##  jquerylib       0.1.4     2021-04-26 [1] CRAN (R 4.0.3)
##  jsonlite        1.8.0     2022-02-22 [1] CRAN (R 4.0.5)
##  kableExtra    * 1.3.4     2021-02-20 [1] CRAN (R 4.0.3)
##  knitr           1.36      2021-09-29 [1] CRAN (R 4.0.5)
##  labeling        0.4.2     2020-10-20 [1] CRAN (R 4.0.5)
##  later           1.2.0     2021-04-23 [1] CRAN (R 4.0.3)
##  lattice         0.20-45   2021-09-22 [1] CRAN (R 4.0.5)
##  lazyeval        0.2.2     2019-03-15 [1] CRAN (R 4.0.5)
##  lifecycle       1.0.2     2022-09-09 [1] CRAN (R 4.0.5)
##  limma         * 3.46.0    2020-10-27 [1] Bioconductor
##  locfit          1.5-9.4   2020-03-25 [1] CRAN (R 4.0.5)
##  magrittr      * 2.0.3     2022-03-30 [1] CRAN (R 4.0.5)
##  Matrix          1.4-1     2022-03-23 [1] CRAN (R 4.0.2)
##  memoise         2.0.1     2021-11-26 [1] CRAN (R 4.0.5)
##  mime            0.12      2021-09-28 [1] CRAN (R 4.0.5)
##  miniUI          0.1.1.1   2018-05-18 [1] CRAN (R 4.0.5)
##  munsell         0.5.0     2018-06-12 [1] CRAN (R 4.0.5)
##  pcaMethods    * 1.82.0    2020-10-27 [1] Bioconductor
##  pheatmap      * 1.0.12    2019-01-04 [1] CRAN (R 4.0.5)
##  pillar          1.8.1     2022-08-19 [1] CRAN (R 4.0.5)
##  pkgbuild        1.3.1     2021-12-20 [1] CRAN (R 4.0.5)
##  pkgconfig       2.0.3     2019-09-22 [1] CRAN (R 4.0.5)
##  pkgload         1.3.0     2022-06-27 [1] CRAN (R 4.0.5)
##  plotly        * 4.9.4.1   2021-06-18 [1] CRAN (R 4.0.5)
##  plyr            1.8.7     2022-03-24 [1] CRAN (R 4.0.5)
##  prettyunits     1.1.1     2020-01-24 [1] CRAN (R 4.0.5)
##  processx        3.7.0     2022-07-07 [1] CRAN (R 4.0.5)
##  profvis         0.3.7     2020-11-02 [1] CRAN (R 4.0.3)
##  promises        1.2.0.1   2021-02-11 [1] CRAN (R 4.0.3)
##  ps              1.7.1     2022-06-18 [1] CRAN (R 4.0.5)
##  purrr           0.3.4     2020-04-17 [1] CRAN (R 4.0.3)
##  R6              2.5.1     2021-08-19 [1] CRAN (R 4.0.5)
##  RColorBrewer  * 1.1-2     2014-12-07 [1] CRAN (R 4.0.5)
##  Rcpp            1.0.9     2022-07-08 [1] CRAN (R 4.0.5)
##  readr         * 1.4.0     2020-10-05 [1] CRAN (R 4.0.5)
##  readxl        * 1.3.1     2019-03-13 [1] CRAN (R 4.0.5)
##  remotes         2.4.2     2021-11-30 [1] CRAN (R 4.0.5)
##  rjson         * 0.2.21    2022-01-09 [1] CRAN (R 4.0.5)
##  rlang           1.0.6     2022-09-24 [1] CRAN (R 4.0.5)
##  rmarkdown       2.13      2022-03-10 [1] CRAN (R 4.0.5)
##  RSQLite         2.2.8     2021-08-21 [1] CRAN (R 4.0.5)
##  rstudioapi      0.14      2022-08-22 [1] CRAN (R 4.0.5)
##  Rtsne         * 0.15      2018-11-10 [1] CRAN (R 4.0.5)
##  rvest           1.0.3     2022-08-19 [1] CRAN (R 4.0.5)
##  S4Vectors       0.28.1    2020-12-09 [1] Bioconductor
##  sass            0.4.2     2022-07-16 [1] CRAN (R 4.0.5)
##  scales          1.2.1     2022-08-20 [1] CRAN (R 4.0.5)
##  sessioninfo     1.2.2     2021-12-06 [1] CRAN (R 4.0.5)
##  shiny           1.7.2     2022-07-19 [1] CRAN (R 4.0.5)
##  stringi         1.7.8     2022-07-11 [1] CRAN (R 4.0.5)
##  stringr       * 1.4.1     2022-08-20 [1] CRAN (R 4.0.5)
##  survival        3.4-0     2022-08-09 [1] CRAN (R 4.0.5)
##  svglite         2.1.0     2022-02-03 [1] CRAN (R 4.0.5)
##  systemfonts     1.0.4     2022-02-11 [1] CRAN (R 4.0.5)
##  tibble        * 3.1.8     2022-07-22 [1] CRAN (R 4.0.5)
##  tidyr         * 1.1.4     2021-09-27 [1] CRAN (R 4.0.5)
##  tidyselect      1.1.2     2022-02-21 [1] CRAN (R 4.0.5)
##  UpSetR        * 1.4.0     2019-05-22 [1] CRAN (R 4.0.5)
##  urlchecker      1.0.1     2021-11-30 [1] CRAN (R 4.0.5)
##  usethis         2.1.6     2022-05-25 [1] CRAN (R 4.0.5)
##  utf8            1.2.2     2021-07-24 [1] CRAN (R 4.0.5)
##  vctrs           0.4.1     2022-04-13 [1] CRAN (R 4.0.5)
##  viridisLite     0.4.1     2022-08-22 [1] CRAN (R 4.0.5)
##  webshot         0.5.4     2022-09-26 [1] CRAN (R 4.0.5)
##  withr           2.5.0     2022-03-03 [1] CRAN (R 4.0.5)
##  WriteXLS      * 6.4.0     2022-02-24 [1] CRAN (R 4.0.5)
##  xfun            0.22      2021-03-11 [1] CRAN (R 4.0.3)
##  XML             3.99-0.10 2022-06-09 [1] CRAN (R 4.0.5)
##  xml2            1.3.3     2021-11-30 [1] CRAN (R 4.0.5)
##  xtable          1.8-4     2019-04-21 [1] CRAN (R 4.0.5)
##  yaml          * 2.2.2     2022-01-25 [1] CRAN (R 4.0.5)
## 
##  [1] /conda/fe3d9ca5acfdeb10baa13d69636bad4e_/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────

5.4 References

The references below cover the tools that affect the scientific and statistical outcome of this analysis. Many other tools, most of them open source, were also used in preparing this report. Please contact us for a full list of references.

Andrews, Simon. 2010. “FastQC: A quality control tool for high throughput sequence data.” https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Methodological) 57 (1): 289–300. http://www.jstor.org/stable/2346101.
Bushnell, Brian. 2015. “BBMap.” https://sourceforge.net/projects/bbmap/.
Chen, Yunshun, Aaron T. L. Lun, and Gordon K. Smyth. 2016. “From reads to genes to pathways: Differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 2; referees: 5 approved].” F1000Research 5: 1–49. https://doi.org/10.12688/f1000research.8987.2.
Diendorfer, A, K Khamina, M Pultar, and M Hackl. 2022. “miND (miRNA NGS Discovery Pipeline): A Small RNA-Seq Analysis Pipeline and Report Generator for microRNA Biomarker Discovery Studies.” F1000Research 11 (233). https://doi.org/10.12688/f1000research.94159.1.
Ewels, Philip, Måns Magnusson, Sverker Lundin, and Max Käller. 2016. “MultiQC: Summarize analysis results for multiple tools and samples in a single report.” Bioinformatics 32 (19): 3047–48. https://doi.org/10.1093/bioinformatics/btw354.
Friedländer, Marc R., Sebastian D. Mackowiak, Na Li, Wei Chen, and Nikolaus Rajewsky. 2012. “MiRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades.” Nucleic Acids Research 40 (1): 37–52. https://doi.org/10.1093/nar/gkr688.
Griffiths-Jones, S. 2004. “The microRNA Registry.” Nucleic Acids Research 32 (90001): D109–D111. https://doi.org/10.1093/nar/gkh023.
Huber, Wolfgang, Vincent J Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S Carvalho, Hector Corrada Bravo, et al. 2015. “Orchestrating high-throughput genomic analysis with Bioconductor.” Nature Methods 12 (2): 115–21. https://doi.org/10.1038/nmeth.3252.
Khamina, Kseniya, Andreas B. Diendorfer, Susanna Skalicky, Moritz Weigl, Marianne Pultar, Teresa L. Krammer, Catharine Aquino Fournier, et al. 2022. “A MicroRNA Next-Generation-Sequencing Discovery Assay (miND) for Genome-Scale Analysis and Absolute Quantitation of Circulating MicroRNA Biomarkers.” International Journal of Molecular Sciences 23 (3). https://doi.org/10.3390/ijms23031226.
Köster, Johannes, and Sven Rahmann. 2012. “Snakemake-a scalable bioinformatics workflow engine.” Bioinformatics 28 (19): 2520–22. https://doi.org/10.1093/bioinformatics/bts480.
Langmead, Ben, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. 2009. “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.” Genome Biology 10 (3). https://doi.org/10.1186/gb-2009-10-3-r25.
Li, Heng, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, and Richard Durbin. 2009. “The Sequence Alignment/Map format and SAMtools.” Bioinformatics 25 (16): 2078–79. https://doi.org/10.1093/bioinformatics/btp352.
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology 15 (12): 1–21. https://doi.org/10.1186/s13059-014-0550-8.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing High-Dimensional Data Using t-SNE.” Journal of Machine Learning Research 9 (August): 2579–2605.
Martin, Marcel. 2011. “Cutadapt removes adapter sequences from high-throughput sequencing reads.” EMBnet.journal 17 (1): 10. https://doi.org/10.14806/ej.17.1.200.
McCarthy, Davis J., Yunshun Chen, and Gordon K. Smyth. 2012. “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation.” Nucleic Acids Research 40 (10): 4288–97. https://doi.org/10.1093/nar/gks042.
Pantano, Lorena, Marc R. Friedländer, Georgia Escaramís, Esther Lizano, Joan Pallarès-Albanell, Isidre Ferrer, Xavier Estivill, and Eulàlia Martí. 2016. “Specific small-RNA signatures in the amygdala at premotor and motor stages of Parkinson’s disease revealed by deep sequencing analysis.” Bioinformatics 32 (5): 673–81. https://doi.org/10.1093/bioinformatics/btv632.
Robinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. 2009. “edgeR: A Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics 26 (1): 139–40. https://doi.org/10.1093/bioinformatics/btp616.
Stacklies, Wolfram, Henning Redestig, Matthias Scholz, Dirk Walther, and Joachim Selbig. 2007. “pcaMethods - A bioconductor package providing PCA methods for incomplete data.” Bioinformatics 23 (9): 1164–67. https://doi.org/10.1093/bioinformatics/btm069.
Sweeney, Blake A., Anton I. Petrov, Boris Burkov, Robert D. Finn, Alex Bateman, Maciej Szymanski, Wojciech M. Karlowski, et al. 2019. “RNAcentral: A hub of information for non-coding RNA sequences.” Nucleic Acids Research 47 (D1): D221–29. https://doi.org/10.1093/nar/gky1034.
Zerbino, Daniel R., Premanand Achuthan, Wasiu Akanni, M. Ridwan Amode, Daniel Barrell, Jyothish Bhai, Konstantinos Billis, et al. 2018. “Ensembl 2018.” Nucleic Acids Research 46 (D1): D754–61. https://doi.org/10.1093/nar/gkx1098.